DESCRIPTION

MULTI-POINT CONFERENCE SYSTEM AND MULTI-POINT CONFERENCE DEVICE

TECHNICAL FIELD 
[0001] 

The present invention relates to a multi-point communication 
system and particularly to a multi-point conference system and 
multi-point conference device. 
BACKGROUND ART 
[0002] 

In conventional multi-point conference systems, one of the 
following methods is used as a method for detecting a speaker from a 
plurality of conference terminals: 

(1) a multi-point conference device detects the speaker; 

(2) a conference terminal notifies the multi-point conference 
device that the conference terminal is a speaker. 

[0003] 

In both methods described above, if the multi-point conference
device identifies a new speaker and switches to the new speaker
immediately, the switch may occur in the middle of a run of inter
frames (inter-frame coded frames). As a result, the conference
terminals other than the speaker cannot switch to the new speaker
smoothly until they receive an intra frame (an intra-frame coded
frame).
[0004] 

Therefore, in order to switch the image of a speaker, the terminal
that is the speaker is requested to transmit or re-transmit an intra
frame.
[0005] 

As an example of a conventional multi-point conference system,
reference will be made to the system disclosed in Patent Document 1,
where a conference terminal notifies the multi-point conference device
that the conference terminal will become a speaker. This system has a
configuration comprising a plurality of conference terminals arranged
at multiple points and a multi-point communication control unit
(multi-point conference device). The conventional multi-point
conference system operates as follows.

[0006]

(A1) When the multi-point conference device receives a speaker
switching request from any conference terminal, it requests the
conference terminal that made the request to transmit or retransmit
an intra frame, and requests the other conference terminals
participating in the conference to freeze the images they currently
display until they receive an intra frame transmitted by the
multi-point conference device.
[0007] 

(A2) Receiving the request to transmit an intra frame from the
multi-point conference device, the conference terminal transmits an
intra frame to the multi-point conference device. Meanwhile, the
conference terminals that have received the request to freeze images
from the multi-point conference device freeze the currently displayed
image until they receive the intra frame.
[0008] 

(A3) The multi-point conference device, on receipt of the intra
frame from the speaker terminal, transmits an intra frame to the other
conference terminals. The conference terminals other than the speaker,
on receipt of the intra frame, release the freeze and switch their
respective images using the intra frame.
[0009] 

As described above, in the conventional multi-point conference
system, the multi-point conference device is able to perform speaker
switching by sending an intra frame transmission request to the
conference terminal that will become the speaker and having that
conference terminal transmit an intra frame to the multi-point
conference device.
[0010] 

Further, among publications relating to multi-point conference
systems, Patent Document 2 (discussed later) discloses a multi-point
control unit that detects a picture header of video data in the
multiplexed data from each video-conference terminal, extracts only
the intra frame video data subjected to intra-frame coding, and
synthesizes the extracted intra frame video data. Patent Document 3
(discussed later) discloses a multi-point video-meeting control system
capable of switching video data and audio data without causing a sense
of incongruity. In Patent Document 3, intra frame data is detected
from video data, intra frame detection information is generated, and
switching to the video data of the terminal selected as the speaker is
performed according to the generated intra frame detection
information. Further, Patent Document 4 (discussed later) discloses a
multi-point communication system in which the current speaker is
accurately identified.
[0011]

[Patent Document 1]

Japanese Patent Kokai Publication No. JP-A-02-274084 (p. 3, FIG. 1)

[Patent Document 2]

Japanese Patent Kokai Publication No. JP-P2001-69474A (p. 3, FIG. 1)

[Patent Document 3]

Japanese Patent Kokai Publication No. JP-P2002-176503A (p. 3, FIG. 1)

[Patent Document 4]

Japanese Patent Kokai Publication No. JP-A-08-331535 (pp. 2-3, FIG. 1)

DISCLOSURE OF THE INVENTION 
PROBLEMS TO BE SOLVED BY THE INVENTION 
[0012]

However, the systems mentioned above have the following
problems.
[0013] 

The first problem is that switching to the image of the speaker
cannot be performed clearly when SIP terminals are used as conference
terminals.
[0014] 

SIP terminals use SIP (Session Initiation Protocol), defined by
IETF RFC 3261 (formerly RFC 2543), for call processing and perform
real-time, bi-directional multimedia communication over an IP network.
For media transfer, they use RTP (Real-time Transport Protocol), which
supports real-time transmission, and UDP (User Datagram Protocol),
which has no re-transmission procedure.
[0015] 

Therefore, the request to retransmit an image is not supported.
As a result, when performing switching of a speaker, the multi-point
conference device cannot request an intra frame from the SIP terminal,
and the speaker switching cannot be performed clearly because no intra
frame is retransmitted.

[0016]

The second problem is that it takes time until the switching of a 
speaker is performed smoothly. 
[0017] 

The reason is that, since the multi-point conference device
performs the switching of a speaker in the middle of an inter frame
transmitted by a previous speaker SIP terminal, the switching of a
speaker cannot be performed clearly until the speaker SIP terminal
transmits an intra frame and the non-speaker SIP terminals receive the
intra frame.
[0018]

Accordingly, it is an object of the present invention to provide a 
multi-point conference system and multi-point conference device 
capable of switching to the image of a new speaker clearly even when 
SIP terminals are used as conference terminals. 
[0019]

Another object of the present invention is to provide a
multi-point conference system and multi-point conference device
capable of switching speakers smoothly.
MEANS TO SOLVE THE PROBLEMS
[0020]

In order to achieve the above objects, the outline configuration 
of the invention disclosed in the present application is as follows. 
[0021] 

The present invention is applied to a multi-point conference
system in which SIP (Session Initiation Protocol) terminals, which do
not support a re-transmission request function, are able to
participate. At the time of speaker switching, a multi-point
conference device processes the image data from the SIP terminal
targeted for switching and transmits an intra frame to the other SIP
terminals participating in the conference as the first image data
after the switch. As a result, the image of the speaker displayed on
the SIP terminals does not get corrupted at the time of switching, and
the switching of a speaker can be performed smoothly.
[0022] 

In a multi-point conference system in accordance with an aspect
of the present invention that comprises a plurality of terminals and a
multi-point conference device connected to the plurality of terminals
and that performs a conference by transmitting/receiving image and
audio, the multi-point conference device comprises a medium processing
unit for detecting a speaker, a memory unit for holding an image from
a terminal participating in a conference, and an image processing unit
for decoding the image of a speaker when the medium processing unit
detects a speaker and for re-encoding the decoded image, wherein the
image processing unit transmits an intra frame as an image frame at
the time of speaker switching when the medium processing unit detects
a speaker.
[0023] 

In the present invention, the image processing unit comprises a
decoder unit for decoding the image of a speaker held in the memory
unit according to the result of speaker detection by the medium
processing unit, a reference image memory unit for holding a reference
image obtained by having the decoder unit decode the last image of a
speaker held in the memory unit, and an encoder unit for re-encoding
an image obtained by having the decoder unit decode an image received
after a speaker is detected based on the reference image held in the
reference image memory unit, and encodes at least the first frame of
the image of a speaker received after a speaker is detected as an
intra frame.
[0024] 

In a method relating to another aspect of the present invention,
when image data received after speaker detection is decoded and
re-encoded by speaker switching processing means after speaker
switching is detected, the first image is re-encoded as an intra
frame, the subsequent images are re-encoded as inter frames, and the
image data is transmitted to the non-speaker SIP terminals. By doing
this, it is possible to have the non-speaker SIP terminals decode an
intra frame at the time of speaker switching.
MERITORIOUS EFFECT OF THE INVENTION 
[0025] 

According to the present invention, by transmitting an intra
frame at the time of speaker switching, images do not get corrupted
when switching to the image of a speaker and the switching can be
performed smoothly.
[0026] 

The reason is that, in the present invention, when image data
received after speaker detection is decoded/re-encoded, the image data
is transmitted to the non-speaker SIP terminals with the first image
re-encoded as an intra frame and the subsequent images as inter
frames.

[0027]

According to the present invention, switching to the image of a
speaker is performed smoothly without depending on a conference
terminal (software).
[0028] 

The reason is that, in the present invention, the multi-point
conference device performs the image switching processing.
[0029] 

According to the present invention, the switching of an image of
a speaker can be performed smoothly even in a real-time protocol
(RTP).
[0030] 

The reason is that, since an intra frame transmission request is
not issued in the present invention, the processing of switching of a
speaker can be performed immediately after a speaker is detected.
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] 

FIG. 1 is a diagram illustrating the system configuration of a 
first embodiment of the present invention. 

FIG. 2 is a diagram illustrating the configuration of a
multi-point conference device of the first embodiment of the present
invention.

FIG. 3 is a flowchart for explaining the operation of processing
of switching of a speaker in the first embodiment of the present
invention.

FIG. 4 is a flowchart for explaining the operation of processing
of switching of a speaker in a second embodiment of the present
invention.

FIG. 5 is a diagram illustrating the system configuration of a 
third embodiment of the present invention. 

FIG. 6 is a diagram illustrating the configuration of a 
multi-point conference device of the third embodiment of the present 
invention. 

FIG. 7 is a flowchart for explaining the operation of processing 
of switching of a speaker in the third embodiment of the present 
invention. 

EXPLANATIONS OF SYMBOLS 
[0032] 

1: multi-point conference device 

2-a to 2-c: SIP terminal 

3: SIP proxy server 

4: IP network 

5: 3G network 

6-a to 6-c: terminal 

11: RTP receive unit 

12: call processing unit 

13: memory 

14: conference control unit 
15: medium processing unit 
16: RTP transmission unit 
20: image processing unit
21: decoder
22: reference image memory
23: encoder
31: receive unit
32: transmission unit

MOST PREFERRED MODE FOR CARRYING OUT THE INVENTION 
[0033] 

Preferred embodiments of the present invention are described in
detail with reference to the attached drawings. FIG. 1 is a diagram
illustrating the configuration of an embodiment of the present
invention.
[0034] 

Referring to FIG. 1, a system relating to the first embodiment of
the present invention comprises a multi-point conference device 1, SIP
terminals 2-a to 2-c, an SIP proxy server 3, and an IP network 4 for
connecting these devices. In FIG. 1, the multi-point conference device
1 transmits/receives medium data and performs speaker switching.
The SIP terminals 2-a to 2-c transmit their image/audio data to the
multi-point conference device 1 and output the image/audio data of a
speaker received from the multi-point conference device 1 to their
output devices.
[0035] 

The SIP proxy server relays SIP data between the SIP terminals 
2-a to 2-c and the multi-point conference device 1. 
[0036] 

FIG. 2 is a diagram illustrating the configuration of the
multi-point conference device 1 in FIG. 1. The multi-point conference
device 1 comprises an RTP receive unit 11, a call processing unit 12, a
memory 13, a conference control unit 14, a medium processing unit 15,
an RTP transmission unit 16, and an image processing unit 20. The
image processing unit 20 comprises a decoder 21, a reference image
memory 22, and an encoder 23.
[0037] 

In FIG. 2, the RTP receive unit 11 receives RTP/UDP/IP packets
from the SIP terminal 2-a and extracts the RTP payload part.
[0038]

When the extracted RTP payload part is SIP protocol data, it is
supplied to the call processing unit 12, and when the extracted RTP
payload part is medium data, it is supplied to the medium processing
unit 15.
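
The routing described in paragraph [0038] can be sketched as follows. This is only an illustrative sketch: the specification does not state how SIP data is told apart from medium data, so the `looks_like_sip` heuristic (checking the first line for the `SIP/2.0` version token), the `Router` class, and the returned labels are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


def looks_like_sip(payload: bytes) -> bool:
    """Assumed heuristic: a SIP request line ends with ' SIP/2.0' and a
    SIP status line starts with 'SIP/2.0 '."""
    first_line = payload.split(b"\r\n", 1)[0]
    return first_line.startswith(b"SIP/2.0 ") or first_line.endswith(b" SIP/2.0")


@dataclass
class Router:
    """Hypothetical stand-in for the receive-side dispatch of [0038]."""
    call_processing: List[bytes] = field(default_factory=list)    # unit 12
    medium_processing: List[bytes] = field(default_factory=list)  # unit 15

    def route(self, payload: bytes) -> str:
        # SIP data goes to call processing, everything else to media.
        if looks_like_sip(payload):
            self.call_processing.append(payload)
            return "call_processing_unit_12"
        self.medium_processing.append(payload)
        return "medium_processing_unit_15"
```

A real device would more likely separate signaling and media by port or session context; the text-based check is used here only to keep the sketch self-contained.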
[0039]

The call processing unit 12 performs call processing for the 
session and notifies the result of the call processing to the RTP 
transmission unit 16. 
[0040] 

Further, the call processing unit 12 notifies the IP addresses and
medium reception ports of the conference participants to the conference
control unit 14.
[0041] 

For the SIP terminals 2-a to 2-c participating in the conference,
the medium processing unit 15 mixes media transmitted from the other
SIP terminals. At the same time, it detects a speaker and notifies the
result of the speaker detection to the conference control unit 14.
[0042] 

The conference control unit 14 manages conference participant
information such as the IP addresses and medium reception ports of the
conference participants.
[0043] 

Further, when the conference control unit 14 is notified of the
speaker detection result by the medium processing unit 15, the
conference control unit 14 instructs the image processing unit 20 to
start the processing for switching of a speaker.
[0044] 

The image processing unit 20, on receipt of the notification
from the conference control unit 14 that it should start the processing
for switching of a speaker, copies from the memory 13 the data
targeted for switching, selected from the video RTP packet data of
each SIP terminal accumulated in the memory 13.
[0045] 

The memory 13 accumulates the video RTP packets from each of the
SIP terminals 2-a to 2-c participating in the conference separately
for each terminal.
[0046] 

In the image processing unit 20, the decoder 21 decodes the
image data of the speaker switching target copied from the memory 13.
[0047]

The last image decoded is accumulated in the reference image
memory 22.
[0048] 

Then the decoder 21 directly copies video RTP data of the
speaker targeted for switching from the RTP receive unit 11, performs
decoding processing according to the reference image accumulated in
the reference image memory 22, and supplies the decoded image to the
encoder 23.
[0049] 

The encoder 23 re-encodes the image decoded by the decoder 21
and copies the re-encoded image data to the medium processing unit 15.
[0050] 

The medium processing unit 15 mixes the re-encoded image
copied from the encoder 23 so as to be transmitted to the non-speaker
terminals and copies the resulting image data to the RTP transmission
unit 16.
[0051] 

The RTP transmission unit 16 packetizes the medium data
received from the medium processing unit 15 into an RTP/UDP/IP
packet and transmits the resulting packet to the SIP terminals 2-b and
2-c.
[0052] 

Further, when the call processing unit 12 requests the RTP
transmission unit 16 to transmit SIP data, the RTP transmission unit 16
packetizes the SIP data into an RTP/UDP/IP packet and transmits the
resulting packet to the destination SIP terminals 2-a to 2-c.
[0053] 

FIG. 3 is a flowchart for explaining the operation of the
embodiment of the present invention. Next, using the flowchart shown
in FIG. 3, the operation of speaker switching of the multi-point
conference device according to the present embodiment will be
described in detail.
[0054] 

First, the medium processing unit 15 constantly checks whether
a new speaker is detected (a step S1).
[0055] 

When a speaker is not detected by the medium processing unit 
15, the RTP receive unit 11 checks the video RTP header of each 
conference participant except for the current speaker (a step S2). 
[0056]

After the video RTP header of each conference participant
except for the current speaker is checked, when the video RTP header
of the SIP terminal 2-a (a conference participant) does not indicate
an intra frame (that is, the frame is an inter frame), the RTP receive
unit 11 copies the video RTP payload of the SIP terminal 2-a to the
memory 13 (a step S4).
[0057] 

When the video RTP header of the SIP terminal 2-a indicates an
intra frame, the RTP receive unit 11 clears the video RTP payloads
that have been copied to the memory 13 for the SIP terminal 2-a (a
step S3), and copies the new video RTP payload to the memory 13 (a
step S4).
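
The buffering rule of steps S2 to S4 means the memory 13 always holds, for each terminal, the most recent intra frame followed by the inter frames that depend on it. A minimal sketch of that rule is given below; the `FrameMemory` class, its method names, and the `(is_intra, payload)` tuple are hypothetical and not taken from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical frame representation: (is_intra, payload bytes).
Frame = Tuple[bool, bytes]


class FrameMemory:
    """Per-terminal frame store following steps S2 to S4."""

    def __init__(self) -> None:
        self._frames: Dict[str, List[Frame]] = {}

    def store(self, terminal: str, is_intra: bool, payload: bytes) -> None:
        frames = self._frames.setdefault(terminal, [])
        if is_intra:
            # Step S3: an intra frame makes the older frames unnecessary.
            frames.clear()
        # Step S4: keep the frame for a possible later speaker switch.
        frames.append((is_intra, payload))

    def frames_for(self, terminal: str) -> List[Frame]:
        # Copy handed to the decoder when the terminal becomes the speaker.
        return list(self._frames.get(terminal, []))
```

Because the stored run always starts at an intra frame, decoding it in order yields a valid reference image for the inter frames that arrive after speaker detection.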
[0058] 

When the medium processing unit 15 detects that the SIP
terminal 2-a is a speaker ("YES" in the step S1), the image processing
unit 20 supplies the video RTP payload data of the SIP terminal 2-a
accumulated in the memory 13 to the decoder 21 (a step S5).
[0059] 

The decoder 21 decodes the video data supplied (a step S6). It
temporarily saves the last image frame decoded in the reference image
memory 22 (a step S7).
[0060] 

During the time between the speaker detection and the saving of
the reference image in the reference image memory 22, the RTP receive
unit 11 checks the video RTP header from the SIP terminal 2-a, which
is the speaker (a step S8). When the video RTP header of the SIP
terminal 2-a indicates an intra frame, the image processing unit 20
stops supplying the video RTP payload of the SIP terminal 2-a to the
decoder 21, and the video RTP payload of the SIP terminal 2-a is
supplied to the medium processing unit 15, completing the processing
for switching of a speaker.
[0061] 

When the video RTP header of the SIP terminal 2-a does not
indicate an intra frame (that is, the frame is an inter frame), the
RTP receive unit 11 supplies the video RTP payload of the SIP terminal
2-a to the decoder 21 (a step S9).
[0062]

The decoder 21 starts decoding the video RTP payload according
to the image frame temporarily saved in the reference image memory 22
(a step S10).
[0063]

The decoded image data is supplied to the encoder 23 and
re-encoded (a step S11).
[0064] 

When the encoder 23 re-encodes the decoded image data, it
encodes the first frame as an intra frame and subsequent frames as inter
frames. The re-encoded image data is copied to the medium
processing unit 15 (a step S12).
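
The frame-type rule applied by the encoder 23 can be illustrated with the short sketch below. It only labels frames; no actual video codec is involved, and the function name `assign_frame_types` and the string labels are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def assign_frame_types(decoded_frames: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Label the first re-encoded frame 'intra' and all later ones 'inter'.

    Each input item stands for one decoded picture; a real encoder would
    produce a bitstream instead of a label.
    """
    for index, frame in enumerate(decoded_frames):
        frame_type = "intra" if index == 0 else "inter"
        yield frame_type, frame
```

For example, `list(assign_frame_types(["f0", "f1", "f2"]))` gives `[("intra", "f0"), ("inter", "f1"), ("inter", "f2")]`, so the non-speaker terminals can start decoding from the very first frame they receive after the switch.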
[0065] 

The medium processing unit 15 copies all the audio RTP payloads
of the conference participants from the RTP receive unit 11 and mixes
them. The mixed audio RTP payloads and the re-encoded image data are
copied to the RTP transmission unit 16. The RTP transmission unit 16
packetizes the image and audio data received from the medium
processing unit 15 into an RTP/UDP/IP packet and transmits the
resulting packet to the non-speaker SIP terminals 2-b and 2-c (a step
S13).
[0066] 

After supplying an image frame to the decoder 21, the RTP receive
unit 11 checks the video RTP header subsequently received from the SIP
terminal 2-a, which is the speaker (a step S8).
[0067]

When the video RTP header of the speaker does not indicate an
intra frame (that is, the frame is an inter frame), the processing for
switching of a speaker (the steps S9 to S13) continues. On the other
hand, when it indicates an intra frame, supplying of the video RTP
payload of the SIP terminal 2-a to the decoder 21 is stopped and the
video RTP payload of the SIP terminal 2-a is supplied to the medium
processing unit 15, completing the processing for switching of a
speaker.
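
The loop formed by steps S8 to S13 can be summarized in the following sketch. It is a simplified model under stated assumptions: frames are represented as `(is_intra, frame_id)` tuples, decoding and re-encoding are reduced to labels, and the function name `switch_speaker` and the labels `"transcoded"` and `"passthrough"` are hypothetical.

```python
from typing import Iterable, List, Tuple


def switch_speaker(incoming: Iterable[Tuple[bool, str]]) -> List[Tuple[str, str, str]]:
    """Model of steps S8 to S13 for frames arriving after speaker detection.

    Returns (path, output_frame_type, frame_id) for each incoming frame.
    """
    output: List[Tuple[str, str, str]] = []
    transcoding = True      # decoder 21 / encoder 23 path is active
    first_output = True     # the first re-encoded frame must be intra
    for is_intra, frame_id in incoming:
        if transcoding and is_intra:
            # Step S8: the speaker sent its own intra frame, so the
            # switching process ends and frames bypass the transcoder.
            transcoding = False
        if transcoding:
            # Steps S9 to S12: decode against the reference image and
            # re-encode, intra first and inter afterwards.
            out_type = "intra" if first_output else "inter"
            first_output = False
            output.append(("transcoded", out_type, frame_id))
        else:
            # Payload is supplied directly to the medium processing unit.
            output.append(("passthrough", "intra" if is_intra else "inter", frame_id))
    return output
```

With the input `[(False, "p1"), (False, "p2"), (True, "i3"), (False, "p4")]`, the first two frames are transcoded (the first as an intra frame) and the remaining two pass through unchanged, which mirrors how the transcoding load ends as soon as the speaker's own intra frame arrives.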
[0068] 

Next, a second embodiment of the present invention will be
described in detail with reference to the drawings.
[0069] 

The configuration of the second embodiment of the present
invention is the same as that of the first embodiment described above.
However, the second embodiment differs from the first embodiment in
that, when the medium processing unit 15 detects a speaker, the RTP
transmission unit 16, on an instruction from the medium processing
unit 15, transmits a SIP method (for instance, the INFO method) that
includes information indicating an intra frame transmission request to
the SIP terminal 2-a (the speaker).
[0070] 

When detecting a speaker, the medium processing unit 15
notifies the speaker detection result to the conference control unit 14.
At the same time, the medium processing unit 15 instructs the RTP
transmission unit 16 to transmit the intra frame transmission request
to the speaker.
[0071] 

The RTP transmission unit 16 transmits an INFO method that
includes information indicating an intra frame transmission request to
the SIP terminal 2-a, which is the speaker.
[0072] 

After receiving the INFO method, the SIP terminal 2-a encodes
the next image frame to be transmitted as an intra frame in accordance
with the request information and transmits the resulting image packet,
which is an intra frame, to the multi-point conference device 1. The
operation hereafter is the same as that of the first embodiment
described above.
[0073] 

FIG. 4 is a flowchart for explaining the operation of the second
embodiment of the present invention. The processing for performing
speaker switching by the multi-point conference device 1 of the present
embodiment will be described in detail with reference to FIG. 4.
[0074] 

Since steps S21, S22, S23 and S24 of the flowchart in FIG. 4 are
identical to the steps S1, S2, S3 and S4 of the flowchart in FIG. 3, the
explanations of these will be omitted.
[0075] 

When the SIP terminal 2-a is detected as a speaker (the step
S21), the RTP transmission unit 16 is notified to transmit the intra
frame transmission request to the speaker, and the RTP transmission
unit 16 transmits the INFO method that includes the intra frame
transmission request information to the SIP terminal 2-a which is the
speaker (a step S25).
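
As an illustration of step S25, the sketch below builds the text of a SIP INFO request that asks the speaker's encoder for an intra frame. The specification does not define the format of the request information; the body used here follows the XML media control format of RFC 5168 (`picture_fast_update`), which is an assumption of this sketch. The function name and its parameters are hypothetical, and mandatory SIP headers such as Via, From, To, and Max-Forwards are omitted for brevity, so the result is not a complete SIP message.

```python
def build_intra_request_info(target_uri: str, call_id: str, cseq: int) -> str:
    """Build a simplified SIP INFO request carrying an intra frame request.

    Assumption: the body uses the RFC 5168 media control XML; the patent
    text itself leaves the format of the request information open.
    """
    body = (
        '<?xml version="1.0" encoding="utf-8"?>\r\n'
        "<media_control><vc_primitive><to_encoder>"
        "<picture_fast_update/>"
        "</to_encoder></vc_primitive></media_control>\r\n"
    )
    header_lines = [
        f"INFO {target_uri} SIP/2.0",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/media_control+xml",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the message body.
    return "\r\n".join(header_lines) + "\r\n\r\n" + body
```

Calling `build_intra_request_info("sip:2-a@example.com", "abc123", 2)` yields a request whose first line is `INFO sip:2-a@example.com SIP/2.0`; the URI and Call-ID values are placeholders.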
[0076] 

Since steps S26 to S34 (the procedure after the INFO method is
transmitted to the speaker SIP terminal 2-a) are the same as the steps
S5 to S13 of the first embodiment shown in FIG. 3, the explanations of
these will be omitted.
[0077] 

As described above, in the present embodiment the request for
intra frame transmission is made immediately after a speaker is
detected, and hence the image of a speaker can be switched more
smoothly than in the first embodiment, in which the processing for
switching of a speaker has to be continued until the speaker
conference terminal transmits an intra frame of its own accord.
[0078] 

In the present embodiment, the INFO method is used as the SIP
method; however, other SIP methods may be used. Further, a SIP
method is used for requesting intra frame transmission in the present
embodiment; however, other commands requesting intra frame
transmission may be used.
[0079] 

Further, in the present embodiment the conference terminals
transmit an intra frame when they receive an intra frame transmission
request; however, a conference terminal need not respond to the intra
frame transmission request.
[0080] 

Next, a third embodiment of the present invention will be 
described in detail with reference to the drawings. 
[0081]

FIGS. 5 and 6 show the configuration of the third embodiment of
the present invention. In addition to the system configuration of the
first embodiment shown in FIG. 1, the present embodiment comprises a
3G network 5 and terminals 6-a to 6-c. Further, in the present
embodiment, the RTP receive unit 11 and the RTP transmission unit 16
in the function block of the multi-point conference device of the first
embodiment shown in FIG. 2 are replaced by a receive unit 31 and a
transmission unit 32, respectively.
[0082] 

In FIG. 5, the terminals 6-a to 6-c are third-generation
telephones (3G-324M) capable of providing image and audio
communication, and the 3G network 5 is a network to which the
terminals 6-a to 6-c are connected.
[0083] 

The multi-point conference device 1 performs a multi-point
conference between the SIP terminals 2-a to 2-c and the terminals 6-a
to 6-c by carrying out heterogeneous network connection between the
IP network 4 and the 3G network 5.
[0084] 

In FIG. 6, the receive unit 31 receives data from the terminal 6-a
and extracts its contents. When the extracted data is Q.931 call
signaling data, the data is supplied to the call processing unit 12,
and when the extracted data is medium data, the data is supplied to
the medium processing unit 15.
[0085]

The call processing unit 12 performs call connection processing 
and notifies the call processing result to the transmission unit 32. 
[0086] 

Further, the call processing unit 12 notifies the telephone
number and user ID of the terminal 6-a (a conference participant) to the
conference control unit 14.
[0087] 

For the SIP terminals 2-a to 2-c and the terminals 6-b and 6-c
participating in the conference, the medium processing unit 15 mixes
media transmitted from the other conference terminals. When the
image/audio codecs used by the SIP terminals 2-a to 2-c and the
terminals 6-a to 6-c are different, the medium data are decoded by the
medium processing unit 15, re-encoded in accordance with the codec of
each conference terminal, and then mixed.

[0088]

At the same time, the medium processing unit 15 detects a
speaker. Since the operations from the notification to the conference
control unit 14 of the speaker detection result by the medium
processing unit 15 to the accumulation of image data from each
conference terminal by the memory 13 are identical to those of the first
embodiment shown in FIG. 2, the explanations will be omitted.
[0089] 

In the image processing unit 20, the decoder 21 decodes the
image data of the speaker targeted for switching copied from the
memory 13 in accordance with the image codec used by the speaker
terminal.
[0090] 

Since the operation of accumulating the last image decoded in
the reference image memory 22 and supplying the decoded image to the
encoder 23 is the same as that of the first embodiment shown in FIG. 2,
the explanation of it will be omitted.
[0091] 

When the encoder 23 re-encodes the image decoded by the
decoder 21, it re-encodes the image in accordance with the image codec
of each non-speaker conference terminal and copies the re-encoded
image data to the medium processing unit 15. The medium processing
unit 15 mixes the image data copied from the encoder 23, which has
been re-encoded in accordance with the image codecs of the non-speaker
conference terminals, for transmission to the corresponding conference
terminals, and copies the mixed image data to the transmission unit 32.
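
The per-codec re-encoding of paragraph [0091] can be sketched as follows. The specification only says that the image is re-encoded in accordance with the codec of each non-speaker terminal; encoding once per distinct codec and sharing the result among terminals that use the same codec is a design choice assumed here. The function name, the codec labels, and the callable-based encoder table are all hypothetical.

```python
from typing import Callable, Dict


def encode_per_codec(
    frame: str,
    terminal_codecs: Dict[str, str],
    encoders: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Return the encoded frame for each non-speaker terminal.

    terminal_codecs maps a terminal name to its codec label, and
    encoders maps a codec label to a function that encodes one frame.
    """
    encoded_by_codec: Dict[str, str] = {}
    result: Dict[str, str] = {}
    for terminal, codec in terminal_codecs.items():
        if codec not in encoded_by_codec:
            # Encode only once per distinct codec (assumed optimization).
            encoded_by_codec[codec] = encoders[codec](frame)
        result[terminal] = encoded_by_codec[codec]
    return result
```

In this model, three receiving terminals that use two different codecs cause only two encoder invocations per frame.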
[0092] 

The transmission unit 32 transmits the medium data received
from the medium processing unit 15 to the SIP terminals 2-a to 2-c and
the terminals 6-b and 6-c according to the media formats of the IP
network 4 and the 3G network 5. Further, the transmission unit 32
transmits the medium data to the target terminals 6-a to 6-c according
to the call processing result notified by the call processing unit 12 via
call processing Q.931.
[0093] 

Since the operation performed for the SIP terminals 2-a to 2-c is
the same as that of the first embodiment described above (refer to FIG.
2), the explanation of it will be omitted.
[0094] 

Next, referring to a flowchart in FIG. 7, the processing for
switching of a speaker by the multi-point conference device 1 of the
present embodiment will be described in detail.
[0095] 

First, the medium processing unit 15 constantly checks whether 
a new speaker is detected (a step S41). 
[0096]

When no speaker is detected, the receive unit 31 checks the
image data of each conference participant except for the current
speaker (a step S42). After the image data of each conference
participant except for the current speaker is checked, when the
image data of the terminal 6-a (a conference participant) is not an
intra frame (that is, it is an inter frame), the image data of the
terminal 6-a is copied to the memory 13 (a step S44).
[0097] 

When the image data of the terminal 6-a is an intra frame, the
image data that has been copied to the memory 13 for the terminal 6-a
is cleared (a step S43), and the new image data is copied to the memory
13 (a step S44).

[0098] 

When the terminal 6-a is detected as a speaker (the step S41),
the image data of the terminal 6-a accumulated in the memory 13 is
supplied to the decoder 21 (a step S45).
[0099] 

The decoder 21 decodes the data using the image codec of the
supplied image data (a step S46). The last image frame decoded is
temporarily saved in the reference image memory 22 (a step S47).
[0100] 

During the time between the speaker detection and the saving of
the reference image in the reference image memory 22, the receive unit
31 checks the image data from the speaker terminal 6-a (a step S48).
When the image data of the terminal 6-a is an intra frame, the
supplying of the image data of the terminal 6-a to the decoder 21 is
stopped, and the image data of the terminal 6-a is supplied to the
medium processing unit 15, completing the processing for switching of
a speaker.

[0101]

When the image data of the terminal 6-a is not an intra frame
(that is, it is an inter frame), the receive unit 31 supplies the
image data of the terminal 6-a to the decoder 21 (a step S49), and the
decoder 21 starts decoding it, using the image codec of the supplied
image data and the image frame temporarily saved in the reference
image memory 22 (a step S50).
[0102] 

The decoded image data is supplied to the encoder 23 and is
re-encoded using the image codecs of the non-speaker conference
terminals (a step S51).
[0103] 

When the encoder 23 re-encodes the decoded image data, it
encodes the first frame as an intra frame and subsequent frames as inter
frames. The re-encoded image data is copied to the medium
processing unit 15 (a step S52).
[0104] 

The medium processing unit 15 copies the audio data of the
conference participants from the receive unit 31, decodes it, re-encodes
it using the audio codecs of the non-speaker conference terminals, and
mixes it. The audio data mixed using the codec of each conference
terminal and the re-encoded image data are copied to the transmission
unit 32. The transmission unit 32 converts the image and audio data
received from the medium processing unit 15 into formats in which
they can be transmitted over the IP network 4 and the 3G network 5,
and transmits them to the non-speaker conference terminals, namely
the SIP terminals 2-a to 2-c and the terminals 6-b and 6-c (a step S53).
[0105] 

As described above, in the present embodiment, a multi-point
conference can be realized between the SIP terminals and the 3G
terminals by the multi-point conference device 1 capable of
interconnecting the heterogeneous networks of the IP network 4 and the
3G network 5.

[0106] 

In the present embodiment, the 3G network is connected to the
IP network as a different kind of network; however, an ISDN network,
an Internet service provider network (ISP network), or a public
switched telephone network (PSTN) may be used instead.
[0107] 

Further, in the present embodiment, the intra frame transmission
request is not made immediately after a speaker is detected; however,
it may be made at that time as in the second embodiment. For instance,
in the case of a 3G network terminal, the intra frame transmission
request may be made by having the multi-point conference device
transmit a videoFastUpdate command defined by ITU-T Recommendation
H.245.
[0108]

It should be noted that other objects, features and aspects of the
present invention will become apparent from the entire disclosure and
that modifications may be made without departing from the gist and
scope of the present invention as disclosed herein and claimed as
appended herewith.

Also it should be noted that any combination of the disclosed
and/or claimed elements, matters and/or items may fall under the
modifications aforementioned.



